Abstract

This paper addresses the problem of recovering relative structure, in the form of an invariant, from two views of a 3D 
scene. The invariant structure is computed without any prior knowledge of camera geometry, or internal calibration, 
and with the property that perspective and orthographic projections are treated alike, namely, the system makes no 
assumption regarding the existence of perspective distortions in the input images. 

We show that, given the location of the epipoles, the projective structure invariant can be constructed from only four
corresponding points projected from four non-coplanar points in space (as in the case of parallel projection). This
result leads to two algorithms for computing projective structure. The first algorithm requires six corresponding 
points, four of which are assumed to be projected from four coplanar points in space. Alternatively, the second 
algorithm requires eight corresponding points, without assumptions of coplanarity of object points. 

Our study of projective structure is applicable to both structure from motion and visual recognition. We use 
projective structure to re-project the 3D scene from two model images and six or eight corresponding points with a 
novel view of the scene. The re-projection process is well-defined under all cases of central projection, including the 
case of parallel projection. 
1 Introduction

The problem we address in this paper is that of recovering relative, non-metric structure of a three-dimensional scene from two images taken from different viewing positions. The relative structure information is in the form of an invariant that can be computed without any prior knowledge of camera geometry, and under all central projections, including the case of parallel projection. The non-metric nature of the invariant allows the cameras to be internally uncalibrated (the intrinsic parameters of the camera are unknown). The unique nature of the invariant allows the system to make no assumptions about the existence of perspective distortions in the input images. Therefore, any degree of perspective distortion is allowed, i.e., orthographic and perspective projections are treated alike; in other words, no assumptions are made on the size of the field of view.

We envision this study as having applications both in the area of structure from motion and in the area of visual recognition. In structure from motion our contribution is an addition to the recent studies of non-metric structure from motion pioneered by Koenderink and Van Doorn (1991) in parallel projection, followed by Faugeras (1992) and Mohr, Quan, Veillon & Boufama (1992) for reconstructing the projective coordinates of a scene up to an unknown projective transformation of 3D projective space. Our approach is similar to Koenderink and Van Doorn's in the sense that we derive an invariant, based on a geometric construction, that records the 3D structure of the scene as a variation from two fixed reference planes measured along the line of sight. Unlike Faugeras and Mohr et al., we do not recover the projective coordinates of the scene, and, as a result, we use a smaller number of corresponding points: in addition to the location of the epipoles we need only four corresponding points, coming from four non-coplanar points in the scene, whereas Faugeras and Mohr et al. require correspondences coming from five points in general position.

The second contribution of our study is to visual recognition of 3D objects from 2D images. We show that our projective invariant can be used to predict novel views of the object, given two model views in full correspondence and a small number of corresponding points with the novel view. The predicted view is then matched against the novel input view, and if the two match, the novel view is considered to be an instance of the same object that gave rise to the two model views stored in memory. This paradigm of recognition is within the general framework of alignment (Fischler and Bolles 1981, Lowe 1985, Ullman 1989, Huttenlocher and Ullman 1987) and, more specifically, of the paradigm proposed by Ullman and Basri (1989) that recognition can proceed using only 2D images, both for representing the model and for matching the model to the input image. We refer to the problem of predicting a novel view from a set of model views using a limited number of corresponding points as the problem of re-projection.

The problem of re-projection has been dealt with in the past primarily assuming parallel projection (Ullman and Basri 1989, Koenderink and Van Doorn 1991). For the more general case of central projection, Barrett, Brill, Haag & Payton (1991) have recently introduced a quadratic invariant based on the fundamental matrix of Longuet-Higgins (1981), which is computed from eight corresponding points. In Appendix E we show that their result is equivalent to intersecting epipolar lines, and is therefore singular for certain viewing transformations, depending on the viewing geometry between the two model views. Our projective invariant is not based on an epipolar intersection but directly on the relative structure of the object, and does not suffer from any singularities, a finding that implies greater stability in the presence of errors.

The projective structure invariant, and the re-projection method that follows, are based on an extension of Koenderink and Van Doorn's representation of affine structure as an invariant defined with respect to a reference plane and a reference point. We start by introducing an alternative affine invariant, using two reference planes (section 5), which can easily be extended to projective space. As a result we obtain a projective structure invariant (section 6).

We show that the difference between the affine and projective cases lies entirely in the location of the epipoles, i.e., given the location of the epipoles, both the affine and projective structures are constructed by linear methods using the information captured from four corresponding points projected from four non-coplanar points in space. In the projective case we need additional corresponding points solely for the purpose of recovering the location of the epipoles (Theorem 1, section 6).

We show that the projective structure invariant can be recovered from two views (produced by parallel or central projection) and six corresponding points, four of which are assumed to be projected from four coplanar points in space (section 7.1). Alternatively, the projective structure can be recovered from eight corresponding points, without assuming coplanarity of object points (section 8.1). The 8-point method uses the fundamental matrix approach (Longuet-Higgins 1981) for recovering the location of the epipoles (as suggested by Faugeras, 1992).

Finally, we show that, for both schemes, it is possible to limit the viewing transformations to the group of rigid motions, i.e., it is possible to work with perspective projection assuming the cameras are calibrated. The result, however, does not include orthographic projection.

Experiments were conducted with both algorithms, and the results show that the 6-point algorithm is stable under noise and under conditions that violate the assumption that four object points are coplanar. The 8-point algorithm, although theoretically superior because it does not require the coplanarity assumption, is considerably more sensitive to noise.


2 Why not Classical SFM? 
The work of Koenderink and Van Doorn (1991) on affine structure from two orthographic views, and the work of Ullman and Basri (1989) on re-projection from two orthographic views, have a clear practical aspect: it is known that at least three orthographic views are required to recover metric structure, i.e., relative depth (Ullman 1979, Huang & Lee 1989, Aloimonos & Brown 1989). Therefore, the suggestion to use affine structure instead of metric structure allows a recognition system to perform re-projection from two model views (Ullman & Basri), and to generate novel views of the object produced by affine transformations in space, rather than by rigid transformations (Koenderink & Van Doorn).

This advantage, of working with two rather than three views, is not present under perspective projection, however. It is known that two perspective views are sufficient for recovering metric structure (Roach & Aggarwal 1979, Longuet-Higgins 1981, Tsai & Huang 1984, Faugeras & Maybank 1990). The question, therefore, is why look for alternative representations of structure, and new methods for performing re-projection?

There are three major problems in structure from motion methods: (i) critical dependence on an orthographic or perspective model of projection, (ii) internal camera calibration, and (iii) the problem of stereo-triangulation.

The first problem is the strict division between methods that assume orthographic projection and methods that assume perspective projection. These two classes of methods do not overlap in their domain of application. The perspective model operates under conditions of significant perspective distortion, such as driving along a stretch of highway, which require a relatively large field of view and relatively large depth variations between scene points (Adiv 1989, Dutta & Snyder 1990, Tomasi 1991, Broida et al. 1990). The orthographic model, on the other hand, provides a reasonable approximation when the imaging situation is at the other extreme, i.e., a small field of view and small depth variation between object points (a situation in which perspective schemes often break down). Typical imaging situations are at neither end of these extremes and, therefore, are vulnerable to errors in both models. From the standpoint of performing recognition, this problem implies that the viewer must have control over the field of view, a property that may be reasonable to assume at the time of model acquisition, but less reasonable to assume at recognition time.

The second problem is related to internal camera calibration. The assumption of perspective projection includes a distinguished point, known as the principal point, which is at the intersection of the optical axis and the image plane. The location of the principal point is an internal parameter of the camera, which may deviate somewhat from the geometric center of the image plane, and may therefore require calibration. Perspective projection also assumes that the image plane is perpendicular to the optical axis, and the possibility of imperfections in the camera therefore requires the recovery of the two axes describing the image frame, and of the focal length. Although the calibration process is somewhat tedious, it is necessary for many of the available commercial cameras (Brown 1971, Faig 1975, Lenz and Tsai 1987, Faugeras, Luong and Maybank 1992). The problem of calibration is less severe under orthographic projection because the projection has no distinguished ray, so any point can serve as the origin; calibration must nevertheless be considered because of the assumption that the image plane is perpendicular to the projecting rays.

The third problem is related to the way shape is typically represented under the perspective projection model. Because the center of projection is also the origin of the coordinate system for describing shape, the shape difference (e.g., the difference in depth between two object points) is orders of magnitude smaller than the distance to the scene, and this makes the computations very sensitive to noise. The sensitivity to noise is reduced if images are taken from distant viewpoints (a large baseline in stereo triangulation), but that makes establishing correspondence between points in both views more of a problem, and hence may make the situation even worse. This problem does not occur under the assumption of orthographic projection because translation in depth is lost under orthographic projection, and therefore the origin of the coordinate system for describing shape (metric and non-metric) is object-centered rather than viewer-centered (Tomasi, 1991).

These problems, in isolation or together, account for much of the sensitivity of structure from motion methods to errors. The recent work of Faugeras (1992) and Mohr et al. (1992) addresses the problem of internal calibration by assuming central projection instead of perspective projection. Faugeras and Mohr et al. then proceed to reconstruct the projective coordinates of the scene. Since projective coordinates are measured relative to the center of projection, this approach does not address the problem of stereo-triangulation or the problem of uniformity under both orthographic and perspective projection models.

3 Camera Model and Notations 

We assume that objects in the world are rigid and are viewed under central projection. In central projection the center of projection is the origin of the camera coordinate frame and can be located anywhere in projective space. In other words, the center of projection can be a point in Euclidean space or an ideal point (as happens in parallel projection). The image plane is assumed to be arbitrarily positioned with respect to the camera coordinate frame (unlike perspective projection, where it is parallel to the xy plane). We refer to this as a non-rigid camera configuration. The motion of the camera, therefore, consists of the translation of the center of projection and the rotation of the coordinate frame around the new location of the center of projection, followed by tilt, pan, and focal length scaling of the image plane with respect to the new optical axis. This model of projection will also be referred to as perspective projection with an uncalibrated camera.

We also include in our derivations the possibility of having a rigid camera configuration. A rigid camera is simply the familiar model of perspective projection in which the center of projection is a point in Euclidean space and the image plane is fixed with respect to the camera coordinate frame. A rigid camera motion, therefore, consists of translation of the center of projection followed by rotation of the coordinate frame and focal length scaling. Note that a rigid camera implicitly assumes internal calibration, i.e., the optical axis pierces the image plane at a fixed point and the image plane is perpendicular to the optical axis.

Figure 1: Koenderink and Van Doorn's Affine Structure.

We denote object points in capital letters and image points in small letters. If P denotes an object point in 3D space, p, p', p'' denote its projections onto the first, second and novel views, respectively. We treat image points as rays (homogeneous coordinates) in 3D space, and refer to the notation p = (x, y, 1) as the standard representation of the image plane. We note that the true coordinates of the image plane are related to the standard representation by means of a projective transformation of the plane. Since we deal with central projection, all representations of image coordinates are allowed, and therefore, without loss of generality, we work with the standard representation (more on that in Appendix A).

4 Affine Structure: Koenderink and Van Doorn's Version

The affine structure invariant described by Koenderink and Van Doorn (1991) is based on a geometric construction using a single reference plane and a reference point not coplanar with the reference plane. In affine geometry (induced by parallel projection), it is known from the fundamental theorem of plane projectivity that three non-collinear corresponding points are sufficient to uniquely determine all other correspondences (see Appendix A for more details on plane projectivity under affine and projective geometry). Using three corresponding points between two views therefore provides us with a transformation (an affine transformation) that determines the location, in the second image plane, of all points of the plane passing through the three reference points.
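A minimal sketch of the three-point affine transformation just described, in Python with NumPy, assuming image points are given in standard coordinates as (x, y) pairs. The function name `affine_from_three_points` is our own, not the paper's; the 3 × 3 system below is solvable exactly when the three points are non-collinear.

```python
import numpy as np

def affine_from_three_points(src, dst):
    """Return the 3x3 affine matrix M with M @ (x, y, 1) = (x', y', 1).

    src, dst: (3, 2) arrays of three non-collinear points in the first
    and second view (hypothetical helper, for illustration only).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous source points as rows [x, y, 1].
    S = np.hstack([src, np.ones((3, 1))])
    # Solve S @ X = dst for the 3x2 matrix X; S is invertible
    # exactly when the three points are non-collinear.
    X = np.linalg.solve(S, dst)
    M = np.eye(3)
    M[:2, :] = X.T  # top two rows hold the affine coefficients
    return M
```

For example, mapping (0, 0), (1, 0), (0, 1) to (1, 2), (3, 2), (1, 5) yields the map x' = 2x + 1, y' = 3y + 2.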

Let P be an arbitrary point in the scene projecting onto p, p' on the two image planes. Let $\tilde{P}$ be the projection of P onto the reference plane along the ray towards the first image plane, and let $\tilde{p}'$ be the projection of $\tilde{P}$ onto the second image plane ($p'$ and $\tilde{p}'$ coincide if P is on the reference plane). Note that the location of $\tilde{p}'$ is known via the affine transformation determined by the projections of the three reference points. Finally, let Q be the fourth reference point (not on the reference plane). Using a simple geometric drawing, the affine structure invariant is derived as follows.

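Consider Figure 1. The projections of the reference point Q and an arbitrary point of interest P form two similar trapezoids: $P\tilde{P}p'\tilde{p}'$ and $Q\tilde{Q}q'\tilde{q}'$ (where $\tilde{Q}$ and $\tilde{q}'$ are defined for Q as $\tilde{P}$ and $\tilde{p}'$ are for P). From the similarity of the trapezoids we have

$$\gamma_p = \frac{|P-\tilde{P}|}{|Q-\tilde{Q}|} = \frac{|p'-\tilde{p}'|}{|q'-\tilde{q}'|}.$$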

Given that q, q' is a known corresponding point, $\gamma_p$ is a shape measure that is invariant under parallel projection (the object points stay fixed while the camera changes the position and orientation of the image plane relative to the projecting rays).
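The ratio $\gamma_p$ can be evaluated directly from image measurements once the affine map of the reference plane is known. The sketch below is a hypothetical helper (`gamma_p` is our name) that takes that 3 × 3 map, e.g. fitted from the three reference correspondences, and returns the unsigned ratio of the formula above; it assumes parallel projection and that $q' \neq \tilde{q}'$, i.e. Q is genuinely off the reference plane.

```python
import numpy as np

def gamma_p(M, p, p2, q, q2):
    """Affine structure gamma_p = |p' - p~'| / |q' - q~'|.

    M      : 3x3 affine map of the reference plane, view 1 -> view 2.
    p, p2  : the point of interest in view 1 and view 2, as (x, y).
    q, q2  : the fourth reference point Q (off the plane) in both views.
    """
    def on_plane(x):
        # Where x would appear in view 2 if its 3D point lay on the
        # reference plane, i.e. p~' = M p.
        return (M @ np.array([x[0], x[1], 1.0]))[:2]

    num = np.linalg.norm(np.asarray(p2, float) - on_plane(p))
    den = np.linalg.norm(np.asarray(q2, float) - on_plane(q))
    return num / den
```

With the identity map, p = (0, 0) seen at (2, 0) and q = (1, 1) seen at (2, 1), the displacements are 2 and 1, so $\gamma_p = 2$.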

Before we extend this result to central projection by using projective geometry, we first describe a different affine invariant using two reference planes, rather than one reference plane and a reference point. The new affine invariant is the one that will be applied later to central projection.

5 Affine Structure Using Two Reference Planes

We make use of the same information, the projections of four non-coplanar points, to set up two reference planes. Let $P_j$, $j = 1, \ldots, 4$, be the four non-coplanar reference points in space, and let $p_j \leftrightarrow p'_j$ be their observed projections in both views. The points $P_1, P_2, P_3$ and $P_2, P_3, P_4$ lie on two different planes; therefore, we can account for the motion of all points coplanar with each of these two planes. Let P be a point of interest, not coplanar with either of the reference planes, and let $\tilde{P}$ and $\hat{P}$ be its projections onto the two reference planes along the ray towards the first view.

Consider Figure 2. The projection of $P$, $\tilde{P}$ and $\hat{P}$ onto $p'$, $\tilde{p}'$ and $\hat{p}'$, respectively, gives rise to two similar trapezoids from which we derive the following relation:

$$\alpha_p = \frac{|P-\tilde{P}|}{|P-\hat{P}|} = \frac{|p'-\tilde{p}'|}{|p'-\hat{p}'|}.$$

The ratio $\alpha_p$ is invariant under parallel projection. There is no particular advantage in preferring $\alpha_p$ over $\gamma_p$ as a measure of affine structure, but, as will be described below, this new construction forms the basis for extending affine structure to projective structure, whereas the single reference plane construction does not (see Appendix D for proof).
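To see the invariance concretely, the following sketch builds a small synthetic scene (all coordinates and rotation angles are arbitrary values chosen for illustration), renders two parallel-projection views, and compares $\alpha_p$ measured in 3D with $\alpha_p$ computed from the images via the two affine maps of the reference planes $P_1P_2P_3$ and $P_2P_3P_4$. Helper names are hypothetical, and NumPy is assumed.

```python
import numpy as np

def fit_affine(src, dst):
    """3x3 affine map taking three non-collinear 2D points src to dst."""
    S = np.hstack([np.asarray(src, float), np.ones((3, 1))])
    M = np.eye(3)
    M[:2, :] = np.linalg.solve(S, np.asarray(dst, float)).T
    return M

def warp(M, x):
    """Apply a 3x3 affine map to a 2D point."""
    return (M @ np.append(x, 1.0))[:2]

def alpha_from_images(v1, v2):
    """alpha_p from two views; rows 0-3 of v1, v2 are the reference
    points p_1..p_4, row 4 is the point of interest p."""
    A = fit_affine(v1[[0, 1, 2]], v2[[0, 1, 2]])  # plane P1 P2 P3
    E = fit_affine(v1[[1, 2, 3]], v2[[1, 2, 3]])  # plane P2 P3 P4
    p, p2 = v1[4], v2[4]
    return np.linalg.norm(p2 - warp(A, p)) / np.linalg.norm(p2 - warp(E, p))

def ortho(points, R):
    """Parallel projection: rotate the scene, then drop the depth axis."""
    return (points @ R.T)[:, :2]

def ray_plane(P, d, a, b, c):
    """Intersection of the line P + t d with the plane through a, b, c."""
    n = np.cross(b - a, c - a)
    return P + (n @ (a - P)) / (n @ d) * d

# Synthetic scene (hypothetical values): four reference points, then P.
pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0.3, 0.2, 1.0],
                [0.2, 0.3, 0.25]])
th, ph = np.radians(30), np.radians(20)
Ry = np.array([[np.cos(th), 0, np.sin(th)], [0, 1, 0],
               [-np.sin(th), 0, np.cos(th)]])
Rx = np.array([[1, 0, 0], [0, np.cos(ph), -np.sin(ph)],
               [0, np.sin(ph), np.cos(ph)]])
v1, v2 = ortho(pts, np.eye(3)), ortho(pts, Rx @ Ry)

# Ground truth: slide P onto both planes along view 1's ray direction.
d = np.array([0.0, 0.0, 1.0])
P1, P2, P3, P4, P = pts
P_tilde = ray_plane(P, d, P1, P2, P3)
P_hat = ray_plane(P, d, P2, P3, P4)
alpha_3d = np.linalg.norm(P - P_tilde) / np.linalg.norm(P - P_hat)
alpha_img = alpha_from_images(v1, v2)
```

Both quantities come out as 1/3 for this scene; any second viewing direction that differs from the first and sees neither reference plane edge-on gives the same image-based value.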

In the projective plane, we need four coplanar points to determine the motion of a reference plane. We show that, given the epipoles, only three corresponding points for each reference plane are sufficient for recovering the associated projective transformations induced by those planes. Altogether, the construction provides us with four points along each epipolar line. The similarity of trapezoids in the affine case therefore turns into a cross-ratio in the projective case.
Figure 3: Definition of projective shape as the cross-ratio of $p'$, $\tilde{p}'$, $\hat{p}'$, $V_l$.

Figure 2: Affine structure using two reference planes.

This leads to the result (Theorem 1) that, in addition to the epipoles, only four corresponding points, projected from four non-coplanar points in the scene, are sufficient for recovering the projective structure invariant for all other points. The epipoles can be recovered either by extending the Koenderink and Van Doorn (1991) construction to projective space using six points (four of which are assumed to be coplanar), or by using other methods, notably those based on the Longuet-Higgins fundamental matrix. The latter leads to projective structure from eight points in general position.

6 Projective Structure 

We assume for now that the location of both epipoles is known, and we will address the problem of finding the epipoles later. The epipoles, also known as the foci of expansion, are the intersections of the line in space connecting the two centers of projection with the image planes. There are two epipoles, one on each image plane: the epipole on the second image we call the left epipole, and the epipole on the first image we call the right epipole. The image lines emanating from the epipoles are known as the epipolar lines.

Consider Figure 3, which illustrates the two reference plane construction, defined earlier for parallel projection, now displayed in the case of central projection. The left epipole is denoted by $V_l$, and because it is on the line $V_1V_2$ (connecting the two centers of projection $V_1$ and $V_2$), the line $PV_1$ projects onto the epipolar line $p'V_l$. Therefore, the points $\tilde{P}$ and $\hat{P}$ project onto the points $\tilde{p}'$ and $\hat{p}'$, which are both on the epipolar line $p'V_l$. The points $p'$, $\tilde{p}'$, $\hat{p}'$ and $V_l$ are collinear and projectively related to $P$, $\tilde{P}$, $\hat{P}$, $V_1$, and therefore have the same cross-ratio:

$$\alpha_p = \frac{|P-\tilde{P}|}{|P-\hat{P}|}\cdot\frac{|V_1-\hat{P}|}{|V_1-\tilde{P}|} = \frac{|p'-\tilde{p}'|}{|p'-\hat{p}'|}\cdot\frac{|V_l-\hat{p}'|}{|V_l-\tilde{p}'|}.$$

Note that when the epipole $V_l$ becomes an ideal point (vanishes along the epipolar line), $\alpha_p$ is the same as the affine invariant defined in section 5 for parallel projection.
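The limit claimed above can be checked in one line. Parametrize the epipolar line so that $\tilde{p}'$, $\hat{p}'$ and $V_l$ sit at positions $\tilde{s}$, $\hat{s}$ and $t$ (a parametrization introduced here only for illustration); only the second factor of the cross-ratio depends on $t$, and it tends to 1 as $V_l$ recedes along the line:

$$\frac{|V_l-\hat{p}'|}{|V_l-\tilde{p}'|} = \frac{|t-\hat{s}|}{|t-\tilde{s}|} \xrightarrow[\;t\to\infty\;]{} 1, \qquad\text{so}\qquad \alpha_p \to \frac{|p'-\tilde{p}'|}{|p'-\hat{p}'|}.$$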

The cross-ratio $\alpha_p$ is a direct extension of the affine structure invariant defined in section 5 and is referred to as projective structure. We can use this invariant to reconstruct any novel view of the object (taken by a non-rigid camera) without ever recovering depth or even projective coordinates of the object.

Having defined the projective shape invariant, and assuming we are still given the locations of the epipoles, we show next how to recover the projections of the two reference planes onto the second image plane, i.e., we describe the computations leading to $\tilde{p}'$ and $\hat{p}'$.

Since we are working under central projection, we need to identify four coplanar points on each reference plane. This is because, in the projective geometry of the plane, four corresponding points, no three of which are collinear, are sufficient to determine uniquely all other correspondences (see Appendix A for more details). We must, therefore, identify four corresponding points that are projected from four coplanar points in space, and then recover the projective transformation that accounts for all other correspondences induced from that plane. The following proposition states that the corresponding epipoles can be used as a fourth corresponding point for any three corresponding points selected from the pair of images.

Proposition 1 A projective transformation, A, that is determined from three arbitrary, non-collinear, corresponding points and the corresponding epipoles, is a projective transformation of the plane passing through the three object points which project onto the corresponding image points. The transformation A is an induced epipolar transformation, i.e., the ray $Ap$ intersects the epipolar line $p'V_l$ for any arbitrary image point $p$ and its corresponding point $p'$.

Comment: An epipolar transformation F is a mapping between corresponding epipolar lines and is determined (not uniquely) from three corresponding epipolar lines and the epipoles. The induced point transformation is $E = (F^{-1})^T$ (induced from the point/line duality of projective geometry; see Appendix C for more details on epipolar transformations).

Proof: Let $p_j \leftrightarrow p'_j$, $j = 1, 2, 3$, be three arbitrary corresponding points, and let $V_l$ and $V_r$ denote the left and right epipoles. First note that the four points $p_j$ and $V_r$ are projected from four coplanar points in the scene. The reason is that the plane defined by the three object points $P_j$ intersects the line $V_1V_2$ connecting the two centers of projection at a point, regular or ideal. That point projects onto both epipoles. The transformation A, therefore, is a projective transformation of the plane passing through the three object points $P_1, P_2, P_3$. Note that A is uniquely determined provided that no three of the four points are collinear.

Let $\rho\tilde{p}' = Ap$ for some arbitrary point p. Because lines are projective invariants, any point along the epipolar line $pV_r$ must project onto the epipolar line $p'V_l$. Hence, A is an induced epipolar transformation. □

Given the epipoles, therefore, we need just three points to determine the correspondences of all other points coplanar with the reference plane passing through the three corresponding object points. The transformation (collineation) A is determined from the following equations:

$$A p_j = \rho_j p'_j, \quad j = 1, 2, 3,$$
$$A V_r = \rho V_l,$$

where $\rho, \rho_j$ are unknown scalars, and $A_{33} = 1$. One can eliminate $\rho, \rho_j$ from the equations and solve for the matrix A from the three corresponding points and the corresponding epipoles. That leads to a linear system of eight equations, described in more detail in Appendix A.
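A minimal sketch of the linear system for A, assuming NumPy. Rather than eliminating the scalars $\rho, \rho_j$ by division, it uses the equivalent condition $p'_j \times (A p_j) = 0$ (and $V_l \times (A V_r) = 0$), keeping all three components of each cross product so that ideal points, such as an epipole with third coordinate 0, still contribute two independent equations; the resulting 12 × 8 system is solved by least squares. This is our formulation of the eight-unknown system, not necessarily the one in Appendix A, and it inherits the normalization $A_{33} = 1$. The function name is hypothetical.

```python
import numpy as np

def collineation_from_four(src, dst):
    """Solve A (with A[2, 2] = 1) such that A @ src[i] ~ dst[i], i = 0..3.

    src, dst: (4, 3) arrays of homogeneous points, e.g. p_1, p_2, p_3, V_r
    and p'_1, p'_2, p'_3, V_l. Ideal points (third coordinate 0) are allowed.
    Unknowns: the 8 entries a11..a23, a31, a32.
    """
    rows, rhs = [], []
    z3 = np.zeros(3)
    for s, (u, v, t) in zip(np.asarray(src, float), np.asarray(dst, float)):
        x, y, w = s
        # The three components of dst x (A s) = 0, linear in the unknowns;
        # the known A33 * w term of the third row of A s moves to the rhs.
        eqs = [
            (np.concatenate([z3, -t * s, [v * x, v * y]]), -v * w),
            (np.concatenate([t * s, z3, [-u * x, -u * y]]), u * w),
            (np.concatenate([-v * s, u * s, [0.0, 0.0]]), 0.0),
        ]
        for row, b in eqs:
            rows.append(row)
            rhs.append(b)
    a, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.vstack([a[0:3], a[3:6], [a[6], a[7], 1.0]])
```

The same function gives E when fed $p_2, p_3, p_4, V_r$ and $p'_2, p'_3, p'_4, V_l$. Using all three cross-product components costs a few redundant rows but avoids a special case for ideal points.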

If $P_1, P_2, P_3$ define the first reference plane, the transformation A determines the location of $\tilde{p}'$ for all other points p ($p'$ and $\tilde{p}'$ coincide if P is coplanar with the first reference plane). In other words, we have that $\tilde{p}' = Ap$. Note that $\tilde{p}'$ is not necessarily a point on the second image plane, but it is on the line $V_2\tilde{P}$. We can determine its location on the second image plane by normalizing $Ap$ such that its third component is set to 1.

Similarly, let $P_2, P_3, P_4$ define the second reference plane (assuming the four object points $P_j$, $j = 1, \ldots, 4$, are non-coplanar). The transformation E is uniquely determined by the equations

$$E p_j = \rho_j p'_j, \quad j = 2, 3, 4,$$
$$E V_r = \rho V_l,$$

and determines all other correspondences induced by the second reference plane (we assume that no three of the four points used to determine E are collinear). In other words, $Ep$ determines the location of $\hat{p}'$ up to a scale factor along the ray $V_2\hat{P}$.

Instead of normalizing $Ap$ and $Ep$, we compute $\alpha_p$ from the cross-ratio of the points represented in homogeneous coordinates, i.e., the cross-ratio of the four rays $V_2p'$, $V_2\tilde{p}'$, $V_2\hat{p}'$, $V_2V_l$, as follows. Let the rays $p'$ and $V_l$ be represented (up to scale) as linear combinations of the rays $\tilde{p}' = Ap$ and $\hat{p}' = Ep$, i.e.,

$$p' = \tilde{p}' + k\hat{p}',$$
$$V_l = \tilde{p}' + k'\hat{p}';$$

then $\alpha_p = k/k'$ (see Appendix B for more details). This way of computing the cross-ratio is preferred over the more familiar cross-ratio of four collinear points, because it enables us to work with all elements of the projective plane, including ideal points (a situation that arises, for instance, when epipolar lines are parallel, and in general under parallel projection).
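The coefficients k and k' can be obtained by a small least-squares fit, since $p'$ and $V_l$ are only defined up to scale: writing $p' \cong \mu\tilde{p}' + \nu\hat{p}'$ gives $k = \nu/\mu$, and likewise for $k'$. The sketch below (hypothetical function name, NumPy assumed) does this; the arbitrary scales of $p'$ and $V_l$ cancel, and an ideal epipole needs no special case.

```python
import numpy as np

def projective_structure(A, E, p, p_prime, V_l):
    """alpha_p = k / k' where p' ~ A p + k E p and V_l ~ A p + k' E p.

    A, E    : 3x3 collineations of the two reference planes.
    p       : homogeneous point in view 1, e.g. (x, y, 1).
    p_prime : its correspondent in view 2 (any scale).
    V_l     : left epipole (third component may be 0, an ideal point).
    """
    p_tilde = A @ np.asarray(p, float)
    p_hat = E @ np.asarray(p, float)
    B = np.column_stack([p_tilde, p_hat])

    def coeff(ray):
        # Write ray = mu * p_tilde + nu * p_hat (least squares, since the
        # three rays are coplanar only up to noise); then k = nu / mu.
        (mu, nu), *_ = np.linalg.lstsq(B, np.asarray(ray, float), rcond=None)
        return nu / mu

    return coeff(p_prime) / coeff(V_l)
```

For points on the x-axis with $\tilde{p}'$, $\hat{p}'$, $p'$, $V_l$ at x = 0, 1, 2, 3, the result is 4/3, matching the collinear-point cross-ratio; moving $V_l$ to the ideal point (1, 0, 0) gives 2, the affine ratio of section 5.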

We have therefore shown the following result: 

Theorem 1 In the case where the locations of the epipoles are known, four corresponding points, coming from four non-coplanar points in space, are sufficient for computing the projective structure invariant $\alpha_p$ for all other points in space projecting onto corresponding points in both views, for all central projections, including parallel projection.

This result shows that the difference between parallel and central projection lies entirely in the epipoles. In both cases four non-coplanar points are sufficient for obtaining the invariant, but in the parallel projection case we have prior knowledge that both epipoles are ideal; therefore they are not required for determining the transformations A and E (in other words, A and E are affine transformations; more on that in Section 7.2).

Another point to note with this result is that the minimal number of corresponding points needed for re-projection is smaller than the previously reported number (Faugeras 1992, Mohr et al. 1992) for recovering the projective coordinates of object points. Faugeras shows that five corresponding points coming from five points in general position (i.e., no four of them are coplanar) can be used, together with the epipoles, to recover the projective coordinates of all other points in space. Because the projective structure invariant requires only four points, re-projection can be done more directly than through full reconstruction of projective coordinates, and is therefore likely to be more stable.

We next discuss algorithms for recovering the location of the epipoles. The problem of recovering the epipoles is well known and several approaches have been suggested in the past (Longuet-Higgins and Prazdny 1980, Rieger and Lawton 1985, Faugeras and Maybank 1990, Hildreth 1991, Faugeras 1992, Faugeras, Luong and Maybank 1992). We start with a method that requires six corresponding points (two points in addition to the four we already have). The method is a direct extension of the Koenderink and Van Doorn (1991) construction in parallel projection, and was described earlier by Lee (1988) for the purpose of recovering the translational component of camera motion.

The second algorithm for locating the epipoles is adopted from Faugeras (1992) and is based on the fundamental matrix of Longuet-Higgins (1981).

7 Epipoles from Six Points 

We can recover the correspondences induced from the first reference plane by selecting four corresponding points, assuming they are projected from four coplanar object points. Let $p_j = (x_j, y_j, 1)$ and $p'_j = (x'_j, y'_j, 1)$, $j = 1, \ldots, 4$, represent the standard image coordinates of the four corresponding points, no three of which are collinear, in both projections. Therefore, the transformation A is uniquely determined by the following equations:

$$\rho_j p'_j = A p_j, \quad j = 1, \ldots, 4.$$

Figure 4: The geometry of locating the left epipole using two points out of the reference plane.

Let $\tilde{p}' = Ap$ be the homogeneous coordinate representation of the ray $V_2\tilde{P}$, and let $\tilde{p}^{-1} = A^{-1}p'$.

Having accounted for the motion of the reference plane, we can easily find the location of the epipoles (in standard coordinates). Given two object points $P_5$, $P_6$ that are not on the reference plane, we can find both epipoles by observing that $\tilde{p}'$ is on the left epipolar line, and similarly that $\tilde{p}^{-1}$ is on the right epipolar line. Stated formally, we have the following proposition:

Proposition 2 The left epipole, denoted by $V_l$, is at the intersection of the line $p'_5\tilde{p}'_5$ and the line $p'_6\tilde{p}'_6$. Similarly, the right epipole, denoted by $V_r$, is at the intersection of $p_5\tilde{p}^{-1}_5$ and $p_6\tilde{p}^{-1}_6$.

Proof: It is sufficient to prove the claim for one of the epipoles, say the left epipole. Consider Figure 4, which describes the construction geometrically. By construction, the line $P_5\tilde{P}_5V_1$ projects to the line $p'_5\tilde{p}'_5$ via $V_2$ (points and lines are projective invariants), and therefore they are coplanar. In particular, $V_1$ projects to $V_l$, which is located at the intersection of $p'_5\tilde{p}'_5$ and $V_1V_2$. Similarly, the line $p'_6\tilde{p}'_6$ intersects $V_1V_2$ at $\hat{V}_l$. Finally, $V_l$ and $\hat{V}_l$ must coincide because the two lines $p'_5\tilde{p}'_5$ and $p'_6\tilde{p}'_6$ are coplanar (both are on the image plane). □

Algebraically, we can recover the ray $V_1V_2$, or $V_l$ up to a scale factor, using the following formula:

$$V_l = (p'_5 \times \tilde{p}'_5) \times (p'_6 \times \tilde{p}'_6).$$
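Proposition 2 and the formula above translate directly into cross products of homogeneous vectors: the line through two points is their cross product, and the intersection of two lines is the cross product of the lines. A sketch with NumPy and hypothetical function names; the right epipole uses $A^{-1}p'$ computed by solving a linear system rather than forming the inverse.

```python
import numpy as np

def left_epipole(A, p5, p5_prime, p6, p6_prime):
    """V_l = (p'_5 x p~'_5) x (p'_6 x p~'_6), with p~'_j = A p_j.

    All points are homogeneous 3-vectors; the result is defined only up
    to scale, and its third component is 0 when the epipole is ideal.
    """
    line5 = np.cross(p5_prime, A @ np.asarray(p5, float))  # epipolar line of P5
    line6 = np.cross(p6_prime, A @ np.asarray(p6, float))  # epipolar line of P6
    return np.cross(line5, line6)

def right_epipole(A, p5, p5_prime, p6, p6_prime):
    """Same construction in view 1, using p~^{-1}_j = A^{-1} p'_j."""
    back5 = np.linalg.solve(A, np.asarray(p5_prime, float))
    back6 = np.linalg.solve(A, np.asarray(p6_prime, float))
    return np.cross(np.cross(p5, back5), np.cross(p6, back6))
```

With only two epipolar lines the construction is exact but offers no protection against noise; when parallel lines are supplied, the returned vector simply has a zero third component rather than failing.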

Note that $V_l$ is defined with respect to the standard coordinate frame of the second camera. We treat the epipole $V_l$ as the ray $V_1V_2$ with respect to $V_2$, and the epipole $V_r$ as the same ray but with respect to $V_1$. Note also that the third component of $V_l$ is zero if the epipolar lines are parallel, i.e., $V_l$ is an ideal point in projective terms (as happens under parallel projection, or when the non-rigid camera motion brings the image plane to a position where it is parallel to the line $V_1V_2$).


In the case where more than two epipolar lines are available (such as when more than six corresponding points are available), one can find a least-squares solution for the epipole by using principal component analysis, as follows. Let $B$ be a $k \times 3$ matrix, where each row represents an epipolar line. The least-squares solution for $V_l$ is the unit eigenvector associated with the smallest eigenvalue of the $3 \times 3$ matrix $B^TB$. Note that this can be done analytically, because the characteristic equation is a cubic polynomial.
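A minimal sketch of this least-squares estimate, assuming NumPy. It uses a numerical symmetric eigensolver instead of solving the cubic analytically, and it uses the rows of $B$ as given (normalizing each line to unit length first is a common choice that the text does not specify). The function name is hypothetical.

```python
import numpy as np

def epipole_least_squares(lines):
    """Least-squares epipole from k >= 2 epipolar lines (rows of B, shape (k, 3)).

    Minimizes |B v| over unit vectors v; the minimizer is the eigenvector of the
    3x3 matrix B^T B with the smallest eigenvalue. np.linalg.eigh returns the
    eigenvalues in ascending order, so that eigenvector is column 0.
    """
    B = np.asarray(lines, float)
    _, V = np.linalg.eigh(B.T @ B)
    return V[:, 0]
```

With exact data, every line passes through the epipole, $B^TB$ has a zero eigenvalue, and the returned vector is the epipole up to scale and sign.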

Altogether, we have a six-point algorithm for recovering both the epipoles and the projective structure $\alpha_p$, and for performing re-projection onto any novel view. We summarize the 6-point algorithm in the following section.

7.1 Re-projection Using Projective Structure: 6-point Algorithm

We assume we are given two model views of a 3D object, and that all points of interest are in correspondence. We assume these correspondences can be based on measures of correlation, as used in optical-flow methods (see also Shashua 1991, Bachelder & Ullman 1992 for methods for extracting correspondences using a combination of optical flow and affine geometry).

Given a novel view, we extract six corresponding points (with one of the model views): $p_j \leftrightarrow p'_j \leftrightarrow p''_j$, $j = 1, \ldots, 6$. We assume the first four points are projected from four coplanar points, and the other corresponding points are projected from points that are not on the reference plane. Without loss of generality, we assume the standard coordinate representation of the image planes, i.e., the image coordinates are embedded in a 3D vector whose third component is set to 1 (see Appendix A). The computations for recovering projective shape and performing re-projection are described below.

1: Recover the transformation $A$ that satisfies $\rho_j p'_j = Ap_j$, $j = 1, \ldots, 4$. This requires setting up a linear system of eight equations (see Appendix A). Apply the transformation to all points $p$, denoting $\tilde{p} = Ap$. Also recover the epipoles $V_l = (p'_5 \times \tilde{p}_5) \times (p'_6 \times \tilde{p}_6)$ and $V_r = (p_5 \times A^{-1}p'_5) \times (p_6 \times A^{-1}p'_6)$.

2: Recover the transformation $E$ that satisfies $\rho V_l = EV_r$ and $\rho_j p'_j = Ep_j$, $j = 4, 5, 6$.

3: Compute the cross-ratio of the points $p', Ap, Ep, V_l$ for all points $p$, and denote it by $\alpha_p$ (see Appendix B for details on computing the cross-ratio of four rays).

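4: Perform step 1 between the first and the novel view: recover $A$ that satisfies $\rho_j p''_j = Ap_j$, $j = 1, \ldots, 4$, apply $A$ to all points $p$ and denote the result by $\tilde{p} = Ap$, and recover the epipoles $V_{ln} = (p''_5 \times \tilde{p}_5) \times (p''_6 \times \tilde{p}_6)$ and $V_{rn} = (p_5 \times A^{-1}p''_5) \times (p_6 \times A^{-1}p''_6)$.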

5: Perform step 2 between the first and the novel view: recover the transformation $E$ that satisfies $\rho V_{ln} = EV_{rn}$ and $\rho_j p''_j = Ep_j$, $j = 4, 5, 6$.

6: For every point $p$, recover $p''$ from the cross-ratio $\alpha_p$ and the three rays $Ap, Ep, V_{ln}$. Normalize $p''$ such that its third coordinate is set to 1.

The entire procedure requires setting up a linear system of eight equations four times (Steps 1, 2, 4, 5) and computing cross-ratios (linear operations as well).
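The six steps can be strung together into a compact, noise-free sketch. It assumes NumPy and makes three choices of its own: it solves for $A$ and $E$ with the DLT null-space formulation (which accepts the epipoles as correspondences), it uses the cross-ratio convention $[x_1,x_3][x_2,x_4]/([x_1,x_4][x_2,x_3])$, which may differ from the one in Appendix B (any convention works if Steps 3 and 6 share it), and in Step 6 it writes $p'' = u\,Ap + w\,V_{ln}$ and solves the cross-ratio equation for $u, w$ in closed form. Function names, cameras, and points are hypothetical. $\alpha_p$ is undefined for $P_4$, which lies on both reference planes.

```python
import numpy as np

def homography(src, dst):
    """3x3 H (up to scale) with dst_j ~ H src_j, from the DLT null space."""
    rows = []
    for x, (u, v, w) in zip(src, dst):
        rows.append(np.concatenate([np.zeros(3), -w * x, v * x]))
        rows.append(np.concatenate([w * x, np.zeros(3), -u * x]))
    return np.linalg.svd(np.array(rows))[2][-1].reshape(3, 3)

def bracket(x, y, n):
    """Signed area [x, y] of two rays lying in the plane with normal n."""
    return np.cross(x, y) @ n

def cross_ratio(x1, x2, x3, x4):
    """[x1,x3][x2,x4] / ([x1,x4][x2,x3]); independent of the scale of each ray."""
    n = np.cross(x2, x4)
    return (bracket(x1, x3, n) * bracket(x2, x4, n)
            / (bracket(x1, x4, n) * bracket(x2, x3, n)))

def plane_maps(p, q):
    """Steps 1-2 for views (p, q): A from P1..P4, epipoles from P5, P6,
    E from V_r -> V_l and P4, P5, P6. Rows of p, q are homogeneous points."""
    A = homography(p[:4], q[:4])
    pt = p @ A.T                          # rows: p~ = A p
    pinv = q @ np.linalg.inv(A).T         # rows: A^{-1} q
    Vl = np.cross(np.cross(q[4], pt[4]), np.cross(q[5], pt[5]))
    Vr = np.cross(np.cross(p[4], pinv[4]), np.cross(p[5], pinv[5]))
    Vl, Vr = Vl / np.linalg.norm(Vl), Vr / np.linalg.norm(Vr)
    E = homography(np.vstack([Vr, p[3:6]]), np.vstack([Vl, q[3:6]]))
    return A, E, Vl

def six_point_reproject(v1, v2, v3_priv):
    """Predict novel-view points; rows 0..5 of v1, v2 are P1..P6, v3_priv is (6, 3)."""
    A, E, Vl = plane_maps(v1[:6], v2[:6])
    An, En, Vln = plane_maps(v1[:6], v3_priv)              # Steps 4-5
    out = []
    with np.errstate(divide="ignore", invalid="ignore"):   # alpha undefined for P4
        for p, q in zip(v1, v2):
            alpha = cross_ratio(q, A @ p, E @ p, Vl)         # Step 3
            a, b, d = An @ p, En @ p, Vln                    # Step 6: solve for p''
            n = np.cross(a, d)
            x = bracket(d, b, n) * a + (alpha - 1.0) * bracket(a, b, n) * d
            out.append(x / x[2])
    return np.array(out)

# Usage on synthetic, noise-free data (hypothetical cameras and points).
rng = np.random.default_rng(0)
X = np.array([[-1, -1, 5], [1, -1, 5], [1, 1, 5], [-1, 1, 5],   # P1..P4 on z = 5
              [0.5, -0.3, 6.5], [-0.6, 0.4, 3.8]], float)        # P5, P6 off-plane
X = np.vstack([X, rng.uniform([-1, -1, 4], [1, 1, 7], size=(10, 3))])

def camera(t):
    """Uncalibrated 3x4 camera: perturbed identity plus a translation t."""
    M = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
    return np.hstack([M, np.reshape(t, (3, 1))])

def project(P, X):
    x = np.c_[X, np.ones(len(X))] @ P.T
    return x / x[:, 2:]

v1 = project(np.hstack([np.eye(3), np.zeros((3, 1))]), X)
v2 = project(camera([1.0, 0.2, 0.1]), X)
v3 = project(camera([-0.8, 0.5, 0.3]), X)
pred = six_point_reproject(v1, v2, v3[:6])
```

In this noise-free setting every re-projected point except $P_4$ should coincide with its true novel-view location up to rounding error.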

We discuss below an important property of this procedure, namely its transparency with respect to the projection model: central and parallel projection are treated alike. This property has implications for the stability of re-projection, whatever the degree of perspective distortion present in the images.

7.2 The Case of Parallel Projection 

The construction for obtaining projective structure is well defined for all central projections, including the case where the center of projection is an ideal point, as happens with parallel projection. The construction has two components: the first component has to do with recovering the epipolar geometry via reference planes, and the second component is the projective invariant $\alpha_p$.

From Proposition 1, the projective transformations $A$ and $E$ can be uniquely determined from three corresponding points and the corresponding epipoles. If both epipoles are ideal, the transformations become affine transformations of the plane (an affine transformation maps ideal points to ideal points and Euclidean points to Euclidean points). All other possibilities (both epipoles Euclidean, or one epipole Euclidean and the other ideal) lead to projective transformations. Because a projectivity of the projective plane is uniquely determined by any four points on the projective plane (provided no three are collinear), the transformations $A$ and $E$ are uniquely determined under all situations of central projection, including parallel projection.

The projective invariant $\alpha_p$ is the same as the one defined under parallel projection (Section 5): affine structure is a particular instance of projective structure in which the epipole $V_l$ is an ideal point. Because the same invariant is used for both parallel and central projection, and because all other elements of the geometric construction hold for both projection models, the overall system is transparent to the projection model being used.

The first implication of this property has to do with stability. Projective structure does not require any perspective distortions, so all imaging situations can be handled, with wide or narrow fields of view. The second implication is that 3D visual recognition from 2D images can be achieved in a uniform manner with regard to the projection model. For instance, we can recognize (via re-projection) a perspective image of an object from only two orthographic model images, and in general any combination of perspective and orthographic images serving as model or novel views is allowed.

The results so far required prior knowledge (or the assumption) that four of the corresponding points come from coplanar points in space. This requirement can be avoided by using two more corresponding points (making eight points overall), as described in the next section.


8 Epipoles from Eight Points 

We adopt a recent algorithm suggested by Faugeras (1992), which is based on Longuet-Higgins' (1981) fundamental matrix. The method is very simple and requires eight corresponding points for recovering the epipoles.

Let $F$ be an epipolar transformation, i.e., $Fl = \rho l'$, where $l = V_r \times p$ and $l' = V_l \times p'$ are corresponding epipolar lines. We can rewrite the projective relation of epipolar lines using the matrix form of cross-products:

$$F(V_r \times p) = F[V_r]p = \rho l',$$

where $[V_r]$ is a skew-symmetric matrix (and hence has rank 2). From the point/line incidence property we have that $p' \cdot l' = 0$, and therefore $p'^T F[V_r] p = 0$, or $p'^T H p = 0$, where $H = F[V_r]$. The matrix $H$ is known as the fundamental matrix introduced by Longuet-Higgins (1981), and is of rank 2. One can recover $H$ (up to a scale factor) directly from eight corresponding points, or by using a principal components approach if more than eight points are available. Finally, because $[V_r]V_r = V_r \times V_r = 0$, it is easy to see that

$$HV_r = 0,$$

and therefore the epipole $V_r$ can be uniquely recovered (up to a scale factor). Note that the determinant of the upper-left $2 \times 2$ principal minor of $H$ vanishes in the case where $V_r$ is an ideal point, i.e., $h_{11}h_{22} - h_{12}h_{21} = 0$. In that case, the $x, y$ components of $V_r$ can be recovered (up to a scale factor) from the third row of $H$. The epipoles, therefore, can be uniquely recovered under both central and parallel projection. We have arrived at the following theorem:

Theorem 2 In the case where we have eight corresponding points of two views taken under central projection (including parallel projection), four of these points, coming from four non-coplanar points in space, are sufficient for computing the projective structure invariant $\alpha_p$ for the remaining four points and for all other points in space projecting onto corresponding points in both views.
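The eight-point recovery of $H$ and of both epipoles can be sketched as follows, assuming NumPy. The rank-2 projection is a standard refinement added here rather than a step stated in the text, and $V_l$ is read off from $H^TV_l = 0$, which is equivalent to re-solving with the roles of the two views swapped. With more than eight points, the same code returns the least-squares (principal components) solution. The function name is hypothetical.

```python
import numpy as np

def fundamental_and_epipoles(p, q):
    """Linear estimate of H with q_j^T H p_j = 0 from N >= 8 correspondences.

    p, q: (N, 3) homogeneous points in the first and second view. Each row of M
    is the outer product q_j p_j^T flattened, so (M @ H.ravel())_j = q_j^T H p_j.
    Returns H (up to scale, rank 2 enforced), V_r with H V_r = 0 and V_l with
    H^T V_l = 0.
    """
    M = np.array([np.outer(qj, pj).ravel() for pj, qj in zip(p, q)])
    H = np.linalg.svd(M)[2][-1].reshape(3, 3)   # (least-squares) null vector of M
    U, s, Vt = np.linalg.svd(H)
    H = U @ np.diag([s[0], s[1], 0.0]) @ Vt     # nearest rank-2 matrix
    return H, Vt[-1], U[:, -1]
```

The epipoles come from the singular vectors of the rank-2 matrix: the last right singular vector spans the null space of $H$, and the last left singular vector spans the null space of $H^T$.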

We summarize in the following section the 8-point scheme for reconstructing projective structure and performing re-projection onto a novel view.

8.1 8-point Re-projection Algorithm 

We assume we have eight corresponding points between two model views and the novel view, $p_j \leftrightarrow p'_j \leftrightarrow p''_j$, $j = 1, \ldots, 8$, and that the first four points come from four non-coplanar points in space. The computations for recovering projective structure and performing re-projection are described below.

1: Recover the fundamental matrix $H$ (up to a scale factor) that satisfies $p'^T_j H p_j = 0$, $j = 1, \ldots, 8$. The right epipole $V_r$ then satisfies $HV_r = 0$. Similarly, the left epipole is recovered from the transposed relation $p_j^T H^T p'_j = 0$, i.e., $H^TV_l = 0$.

2: Recover the transformation $A$ that satisfies $\rho V_l = AV_r$ and $\rho_j p'_j = Ap_j$, $j = 1, 2, 3$. Similarly, recover the transformation $E$ that satisfies $\rho V_l = EV_r$ and $\rho_j p'_j = Ep_j$, $j = 2, 3, 4$.



3: Compute $\alpha_p$ as the cross-ratio of $p', Ap, Ep, V_l$, for all points $p$.

4: Perform steps 1 and 2 between the first and the novel view: recover the epipoles $V_{rn}, V_{ln}$, and the transformations $A$ and $E$.

5: For every point $p$, recover $p''$ from the cross-ratio $\alpha_p$ and the three rays $Ap, Ep, V_{ln}$. Normalize $p''$ such that its third coordinate is set to 1.

We discuss next the possibility of working with a rigid camera (i.e., perspective projection and a calibrated camera).


9 The Rigid Camera Case 


The advantage of the non-rigid camera model (or the central projection model) used so far is that images can be obtained from uncalibrated cameras. The price paid for this property is that the images that produce the same projective structure invariant (the equivalence class of images of the object) can be produced by applying non-rigid transformations of the object, in addition to rigid transformations.

In this section we show that it is possible to verify whether the images were produced by rigid transformations, which is equivalent to working with perspective projection assuming the cameras are internally calibrated. This can be done for both schemes presented above, i.e., the 6-point and 8-point algorithms. In both cases we exclude orthographic projection and assume only perspective projection.

In the perspective case, the second reference plane is the image plane of the first model view, and the transformation for projecting the second reference plane onto any other view is the rotational component of camera motion (rigid transformation). We recover the rotational component of camera motion by adopting a result derived by Lee (1988), who shows that the rotational component of motion can be uniquely determined from two corresponding points and the corresponding epipoles. We then show that projective structure can be uniquely determined, up to a uniform scale factor, from two calibrated perspective images.

Proposition 3 (Lee, 1988) In the case of perspective projection, the rotational component of camera motion can be uniquely recovered, up to a reflection, from two corresponding points and the corresponding epipoles. The reflection component can also be uniquely determined by using a third corresponding point.


Proof: Let $l'_j = p'_j \times V_l$ and $l_j = p_j \times V_r$, $j = 1, 2$, be two corresponding epipolar lines. Because $R$ is an orthogonal matrix, it leaves vector magnitudes unchanged, and we can normalize the lengths of $l'_1, l'_2, V_l$ to be the same as those of $l_1, l_2, V_r$, respectively. We therefore have $l'_j = Rl_j$, $j = 1, 2$, and $V_l = RV_r$, which is sufficient for determining $R$ up to a reflection. Note that, because $R$ is a rigid transformation, it is both an epipolar and an induced epipolar transformation (the induced transformation $E$ is determined by $E = (R^{-1})^T$; therefore $E = R$, because $R$ is an orthogonal matrix).
Figure 5: Illustration that projective shape can be recovered only up to a uniform scale (see text).


To determine the reflection component, it is sufficient to observe a third corresponding point $p_3 \leftrightarrow p'_3$. The object point $P_3$ is along the ray $V_1p_3$ and therefore has the coordinates $\alpha_3 p_3$ (w.r.t. the first camera coordinate frame), and is also along the ray $V_2p'_3$ and therefore has the coordinates $\alpha'_3 p'_3$ (w.r.t. the second camera coordinate frame). We note that the ratio between $\alpha_3$ and $\alpha'_3$ is a positive number. The change of coordinates is represented by:

$$\beta V_l + \alpha_3 R p_3 = \alpha'_3 p'_3,$$

where $\beta$ is an unknown constant. If we multiply both sides of the equation by $l'_j$, $j = 1, 2, 3$, the term $\beta V_l$ drops out, because $V_l$ is incident to all left epipolar lines, and after substituting $l'^T_j R$ with $l_j^T$, we are left with

$$\alpha_3\, l_j \cdot p_3 = \alpha'_3\, l'_j \cdot p'_3,$$

which is sufficient for determining the sign of $l'_j$. □

The rotation matrix $R$ can be uniquely recovered from any three corresponding points and the corresponding epipoles. Projective structure can be reconstructed by replacing the transformation $E$ of the second reference plane with the rigid transformation $R$ (which is equivalent to treating the first image plane as a reference plane). We show next that this can lead to projective structure up to an unknown uniform scale factor (unlike the non-rigid camera case).

Proposition 4 In the perspective case, the projective shape constant $\alpha_p$ can be determined from two views at most up to a uniform scale factor.

Proof: Consider Figure 5, and let the effective translation be $V_2 - V_s = k(V_2 - V_1)$, which is the true translation scaled by an unknown factor $k$. Projective shape, $\alpha_p$, remains fixed if the scene and the focal length of the first view are scaled by $k$: from similarity of triangles we have

$$k = \frac{V_s - V_2}{V_1 - V_2} = \frac{P_s - V_s}{P - V_1} = \frac{f_s}{f} = \frac{p_s - V_s}{p - V_1} = \frac{P_s - V_2}{P - V_2},$$

where $f_s$ is the scaled focal length of the first view (each ratio of parallel vectors denotes their common scalar factor). Since the magnitude of the translation along the line $V_1V_2$ is irrecoverable, we can assume it is null, and compute $\alpha_p$ as the cross-ratio of $p', Ap, Rp, V_l$, which determines projective structure up to a uniform scale. □

Figure 6: The basic object configuration for the experimental set-up.

Because $\alpha_p$ is determined up to a uniform scale, we need an additional point in order to establish a common scale during the process of re-projection (we can use one of the six or eight points we already have). We therefore obtain the following result:

Theorem 3 In the perspective case, a rigid re-projection from two model views onto a novel view is possible, using four corresponding points coming from four non-coplanar points, and the corresponding epipoles. The projective structure computed from two perspective images is invariant up to an overall scale factor.

Orthographic projection is excluded from this result because it is well known that the rotational component cannot be uniquely determined from two orthographic views (Ullman 1979, Huang and Lee 1989, Aloimonos and Brown 1989). To see what happens in the case of parallel projection, note that the epipoles are vectors in the $xy$ plane of their coordinate systems (ideal points), and the epipolar lines are two vectors perpendicular to the epipole vectors. The equation $RV_r = V_l$ takes care of the rotation in the image plane (around the optical axis). The other two equations, $Rl_j = l'_j$, $j = 1, 2$, take care only of rotation around the epipolar direction; rotation around an axis perpendicular to the epipolar direction is not accounted for. The equations for solving for $R$ provide a non-singular system of equations, but they produce a rotation matrix with no rotational component around an axis perpendicular to the epipolar direction.

10 Simulation Results Using Synthetic Objects

We ran simulations using synthetic objects to illustrate the re-projection process using the 6-point scheme under various imaging situations. We also tested the robustness of the re-projection method under various types of noise. Because the 6-point scheme requires that four of the corresponding points be projected from four coplanar points in space, it is of special interest to see how the method behaves under conditions that violate this assumption, and under noise conditions in general. The stability of the 8-point algorithm largely depends on the method for recovering the epipoles. The method adopted from Faugeras (1992), described in Section 8, based on the fundamental matrix, tends to be very sensitive to noise if the minimal number of points (eight points) is used. We have, therefore, focused the experimental error analysis on the 6-point scheme.

Figure 6 illustrates the experimental set-up. The object consists of 26 points in space arranged in the following manner: 14 points are on a plane (the reference plane) ortho-parallel to the image plane, and 12 points are out of the reference plane. The reference plane is located two focal lengths away from the center of projection (the focal length is set to 50 units). The depth of out-of-plane points varies randomly between 10 and 25 units away from the reference plane. The $x, y$ coordinates of all points, except the points $P_1, \ldots, P_6$, vary randomly between 0 and 240. The 'privileged' points $P_1, \ldots, P_6$ have $x, y$ coordinates that place these points all around the object (clustering privileged points together will inevitably contribute to instability).

The first view is simply a perspective projection of the object. The second view is the result of rotating the object around the point (128, 128, 100) with an axis of rotation described by the unit vector (0.14, 0.7, 0.7) by an angle of 29 degrees, followed by a perspective projection (note that rotation about a point in space is equivalent to rotation about the center of projection followed by translation). The third (novel) view is constructed in a similar manner with a rotation around the unit vector (0.7, 0.7, 0.14) by an angle of 17 degrees. Figure 7 (first row) displays the three views. Also in Figure 7 (second row) we show the result of applying the transformation due to the four coplanar points $p_1, \ldots, p_4$ (Step 1, see Section 7.1) to all points in the first view. We see that all the coplanar points are aligned with their corresponding points in the second view, and all other points are situated along epipolar lines. The display on the right in the second row shows the final re-projection result (the 8-point and 6-point methods produce the same result). All points re-projected from the two model views are accurately aligned (noise-free experiment) with their corresponding points in the novel view.
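The view construction just described can be reproduced roughly as follows, assuming NumPy. The rotation center, axes, angles, focal length, and the 0.5 orthographic scale come from the text; the Rodrigues rotation formula, the center of projection at the origin looking along $+z$, the absence of any image-coordinate offset, and the function names are assumptions of this sketch. The axis (0.14, 0.7, 0.7) has norm of about 0.9998, so it is renormalized.

```python
import numpy as np

def rotate_about_point(X, center, axis, degrees):
    """Rotate points X (N, 3) by `degrees` about the axis through `center` (Rodrigues)."""
    c = np.asarray(center, float)
    k = np.asarray(axis, float)
    k = k / np.linalg.norm(k)            # (0.14, 0.7, 0.7) is only approximately unit
    th = np.radians(degrees)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])   # cross-product matrix of k
    R = np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)
    return (np.asarray(X, float) - c) @ R.T + c

def perspective(X, f=50.0):
    """Pinhole projection, center of projection at the origin, optical axis +z."""
    X = np.asarray(X, float)
    return f * X[:, :2] / X[:, 2:3]

def orthographic(X, scale=1.0):
    """Scaled orthographic projection along z (the third row of Figure 7 uses 0.5)."""
    return scale * np.asarray(X, float)[:, :2]

# Hypothetical usage for an object array `obj` of shape (N, 3):
# view1 = perspective(obj)
# view2 = perspective(rotate_about_point(obj, (128, 128, 100), (0.14, 0.7, 0.7), 29))
# view3 = perspective(rotate_about_point(obj, (128, 128, 100), (0.7, 0.7, 0.14), 17))
```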

The third row of Figure 7 illustrates a more challenging imaging situation (still noise-free). The second view is orthographically projected (and scaled by 0.5) following the same rotation and translation as before, and the novel view is the result of a central projection onto a tilted image plane (rotated by 12 degrees around a coplanar axis parallel to the $x$-axis). We therefore have the situation of recognizing a non-rigid perspective projection from a novel viewing position, given a rigid perspective projection and a rigid orthographic projection from two model viewing positions. The 6-point re-projection scheme was applied, with the result that all re-projected points are in accurate alignment with their corresponding points in the novel view. Identical results were observed with the 8-point algorithm.

Figure 7: Illustration of Re-projection. Row 1 (left to right): Three views of the object, two model views and a novel view, constructed by rigid motion following perspective projection. The filled dots represent $p_1, \ldots, p_4$ (coplanar points). Row 2: Overlay of the second view and the first view following the transformation due to the reference plane (Step 1, Section 7.1). All coplanar points are aligned with their corresponding points; the remaining points are situated along epipolar lines. The right-hand display is the result of re-projection: the re-projected image perfectly matches the novel image (noise-free situation). Row 3: The left-hand display shows the second view, which is now orthographic. The middle display shows the third view, which is now a perspective projection onto a tilted image plane. The right-hand display is the result of re-projection, which perfectly matches the novel view.

The remaining experiments, discussed in the following sections, were done under various noise conditions. We conducted three types of experiments. The first experiment tested stability in the situation where $P_1, \ldots, P_4$ are non-coplanar object points. The second experiment tested stability under random noise added to all image points in all views, and the third experiment tested stability in the situation where less noise is added to the six privileged points than to the other points.

10.1 Testing Deviation from Coplanarity 

In this experiment we investigated the effect of translating $P_1$ along the optical axis (of the first camera position) from its initial position on the reference plane ($z = 100$) to the farthest depth position ($z = 125$), in increments of one unit at a time. The experiment was conducted using several objects of the type described above (the six privileged points were fixed, and the remaining points were assigned random positions in space in different trials), undergoing the same motion described above (as in Figure 7, first row). The effect of depth translation to the level $z = 125$ is a shift of 0.93 pixels in the location of $p_1$, 1.58 pixels in that of $p'_1$, and 3.26 pixels in that of $p''_1$. Depth translation is therefore equivalent to perturbing the locations of the projections of $P_1$ by various degrees (depending on the 3D motion parameters).

Figure 8 shows the average pixel error in re-projection over the entire range of depth translation. The average pixel error was measured as the average of the deviations of the re-projected points from the actual locations of the corresponding points in the novel view, taken over all points. Figure 8 also displays the result of re-projection for the case where $P_1$ is at $z = 125$. The average error is 1.31 pixels, and the maximal error (the point with the most deviation) is 7.1 pixels. The alignment between the re-projected image and the novel image is, for the most part, fairly accurate.

10.2 Situation of Random Noise to all Image Locations

We next add random noise to all image points in all three views ($P_1$ is set back to the reference plane). This experiment was done repeatedly over various degrees of noise and over several objects. The results shown here have noise between 0 and 1 pixel randomly added to the $x$ and $y$ coordinates separately. The maximal perturbation is therefore $\sqrt{2}$ pixels, and because the direction of perturbation is random, the maximal error in relative location is double, i.e., 2.8 pixels. Figure 9 shows the average pixel errors over 10 trials (one particular object, the same motion as before). The average error fluctuates around 1.6 pixels. Also shown is the result of re-projection on a typical trial with an average error of 1.05 pixels and a maximal error of 5.41 pixels. The match between the re-projected image and the novel image is relatively good considering the amount of noise added.
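The noise bounds quoted above follow from simple arithmetic, assuming independent perturbations of at most 1 pixel in each coordinate; the relative error between two points is largest when they are perturbed in opposite directions:

$$\max\|\delta\| = \sqrt{1^2 + 1^2} = \sqrt{2} \approx 1.41 \text{ pixels}, \qquad \max\|\delta_1 - \delta_2\| = 2\sqrt{2} \approx 2.83 \text{ pixels}.$$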

10.3 Random Noise Case 2 

A more realistic situation occurs when the magnitude of noise associated with the six privileged points is much lower than the noise associated with other points, because the points of interest we track are often associated with distinct intensity structure (such as the tip of the eye in a picture of a face). Correlation methods, for instance, are known to perform much better at such locations than in areas with smooth intensity change, or in areas where the change in intensity is one-dimensional. We therefore applied a perturbation of 0 to 0.3 pixels to the $x$ and $y$ coordinates of the six points, and a perturbation of 0 to 1 pixel to all other points (as before). The results are shown in Figure 10. The average pixel error over 10 trials fluctuates around 0.5 pixels, and the re-projection shown for a typical trial (average error 0.52 pixels, maximal error 1.61 pixels) is in relatively good correspondence with the novel view. With larger perturbations in the range of 0 to 2 pixels, the algorithm behaves proportionally well, i.e., the average error over 10 trials is 1.37 pixels.

11 Summary 

In this paper we focused on the problem of recovering relative, non-metric structure from two views of a 3D object. Specifically, the invariant structure we recover does not require internal camera calibration, does not involve full reconstruction of shape (Euclidean or projective coordinates), and treats parallel and central projection as an integral part of one unified system. We have also shown that the invariant can be used for the purposes of visual recognition, within the framework of the alignment approach to recognition.

The study is based on an extension of Koenderink and Van Doorn's representation of affine structure as an invariant defined with respect to a reference plane and a reference point. We first showed that the KV affine invariant cannot be extended directly to a projective invariant (Appendix D), but that there exists another affine invariant, described with respect to two reference planes, that can easily be extended to projective space. As a result we obtained the projective structure invariant.

We have shown that the difference between the affine and projective cases lies entirely in the location of the epipoles, i.e., given the location of the epipoles, both the affine and projective structure are constructed from the same information, captured by four corresponding points projected from four non-coplanar points in space. Therefore, the additional corresponding points in the projective case are used solely for recovering the location of the epipoles.

We have shown that the location of the epipoles can be recovered under both parallel and central projection using six corresponding points, with the assumption that four of those points are projected from four coplanar points in space, or alternatively by having eight corresponding points without assumptions on coplanarity. The overall methods for reconstructing projective structure and achieving re-projection were referred to as the 6-point and the 8-point algorithms. These algorithms have the unique property that projective structure can be recovered from both orthographic and perspective images from uncalibrated cameras. This property implies, for instance, that we can perform recognition of a perspective image of an object given two orthographic images as a model. It also implies greater stability, because the size of the field of view is no longer an issue in the process of reconstructing shape or performing re-projection.

Figure 8: Deviation from coplanarity: average pixel error due to translation of $P_1$ along the optical axis from $z = 100$ to $z = 125$, in increments of one unit (horizontal axis: deviation from reference plane). The result of re-projection (overlay of re-projected image and novel image) for the case $z = 125$. The average error is 1.31 pixels and the maximal error is 7.1 pixels.

Figure 9: Random noise added to all image points, over all views, for 10 trials. Average pixel error fluctuates around 1.6 pixels. The result of re-projection on a typical trial with an average error of 1.05 pixels and a maximal error of 5.41 pixels.
A Fundamental Theorem of Plane Projectivity

The fundamental theorem of plane projectivity states that a projective transformation of the plane is completely determined by four corresponding points. We prove the theorem first using a geometric drawing, and then algebraically by introducing the concept of rays (homogeneous coordinates). The appendix ends with the system of linear equations for determining the correspondence of all points in the plane, given four corresponding points (used repeatedly throughout this paper).

Definitions: A perspectivity between two planes is defined as a central projection from one plane onto the other. A projectivity is defined as made out of a finite sequence of perspectivities. A projectivity, when represented in an algebraic form, is called a projective transformation. The fundamental theorem states that a projectivity is completely determined by four corresponding points.

Geometric Illustration 

Consider the geometric drawing in Figure 11. Let $A, B, C, U$ be four coplanar points in the scene, let $A', B', C', U'$ be their projections in the first view, and let $A'', B'', C'', U''$ be their projections in the second view. By construction, the two views are projectively related to each other. We further assume that no three of the points are collinear (the four points form a quadrangle), and without loss of generality let $U$ be located within the triangle $ABC$. Let $BC$ be the $x$-axis and $BA$ be the $y$-axis. The projection of $U$ onto the $x$-axis, denoted by $U_x$, is the intersection of the line $AU$ with the $x$-axis. Similarly, $U_y$ is the intersection of the line $CU$ with the $y$-axis. Because straight lines project onto straight lines, $U_x, U_y$ correspond to $U'_x, U'_y$ if and only if $U$ corresponds to $U'$. For any other point $P$ coplanar with $ABCU$ in space, its coordinates $P_x, P_y$ are constructed in a similar manner. We therefore have that $B, U_x, P_x, C$ are collinear, and therefore their cross ratio must be equal to the cross ratio of $B', U'_x, P'_x, C'$, i.e.,

$$\frac{BC \cdot U_xP_x}{BP_x \cdot U_xC} = \frac{B'C' \cdot U'_xP'_x}{B'P'_x \cdot U'_xC'}.$$

This form of cross ratio is known as the canonical cross ratio. In general there are 24 cross ratios, six of which are numerically different (see Appendix B for more details on cross-ratios). Similarly, the cross ratio along the $y$-axis of the reference frame is equal to the cross ratio of the corresponding points in both views.

Figure 10: Random noise added to non-privileged image points, over all views, for 10 trials. Average pixel error fluctuates around 0.5 pixels. The result of re-projection on a typical trial with an average error of 0.52 pixels and a maximal error of 1.61 pixels.

Figure 11: The geometry underlying plane projectivity from four points.

Therefore, for any point $p'$ in the first view, we construct its $x$ and $y$ locations, $p'_x, p'_y$, along $B'C'$ and $B'A'$, respectively. From the equality of cross ratios we find the locations of $p''_x, p''_y$, and that leads to $p''$. Because we have used only projective constructions, i.e., straight lines project to straight lines, we are guaranteed that $p'$ and $p''$ are corresponding points.
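The equality of canonical cross ratios used in this construction can be checked numerically on a single line. A minimal sketch, assuming NumPy: points on the $x$-axis are given by signed positions, and a projectivity of the line is written as $x \mapsto (ax + b)/(cx + d)$ with $ad - bc \neq 0$ (a standard parameterization, not taken from the text). Function names are hypothetical.

```python
import numpy as np

def canonical_cross_ratio(B, U, P, C):
    """(BC * UP) / (BP * UC) for collinear points given by signed positions."""
    return ((C - B) * (P - U)) / ((P - B) * (C - U))

def projectivity_1d(x, a, b, c, d):
    """Projective map of the line, x -> (a x + b) / (c x + d), with ad - bc != 0."""
    return (a * x + b) / (c * x + d)

pts = np.array([0.0, 0.7, 1.6, 3.0])           # B, U_x, P_x, C along the x-axis
img = projectivity_1d(pts, 2.0, 1.0, 0.3, 1.5)  # their images under one projectivity
```

Each point appears once in the numerator and once in the denominator, so the factors $(ad - bc)/((cx_i + d)(cx_j + d))$ that the map introduces into every difference cancel, and the two cross ratios agree.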


Algebraic Derivation 

From an algebraic point of view it is convenient to view points as lying on rays emanating from the center of projection. A ray representation is also called the homogeneous coordinates representation of the plane, and is achieved by adding a third coordinate. Two vectors represent the same point $X = (x, y, z)$ if they differ at most by a scale factor (different locations along the same ray). A key result, which makes this representation amenable to the application of linear algebra to geometry, is described in the following proposition:

Proposition 5 A projectivity of the plane is equivalent to a linear transformation of the homogeneous representation.

The proof is omitted here, and can be found in Tuller (1967, Theorems 5.22, 5.24). A projectivity is equivalent, therefore, to a linear transformation applied to the rays. Because the correspondence between points and coordinates is not one-to-one, we have to take scalar factors of proportionality into account when representing a projective transformation. An arbitrary projective transformation of the plane can be represented as a non-singular linear transformation (also called a collineation) $\rho X' = TX$, where $\rho$ is an arbitrary scale factor.
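A small numerical illustration of Proposition 5, assuming NumPy and a hypothetical collineation `T`: scaling a ray does not change the point it represents, and collinear points (zero determinant of their stacked homogeneous coordinates) remain collinear after the linear map.

```python
import numpy as np

def to_point(X):
    """Euclidean point represented by the ray X = (x, y, z), assuming z != 0."""
    return X[:2] / X[2]

# A hypothetical non-singular collineation (rho X' = T X).
T = np.array([[2.0, 0.3, -1.0],
              [0.1, 1.5, 0.4],
              [0.02, 0.01, 1.0]])

# Scaling a ray does not change the point it represents, before or after T.
X = np.array([1.0, 2.0, 1.0])
same_image = np.allclose(to_point(T @ X), to_point(T @ (7.0 * X)))

# Three collinear points (on y = 2x) as columns of homogeneous coordinates.
P = np.array([[0.0, 0.0, 1.0],
              [1.0, 2.0, 1.0],
              [3.0, 6.0, 1.0]]).T
# det(T P) = det(T) det(P) = 0, so the images are collinear as well.
det_before = np.linalg.det(P)
det_after = np.linalg.det(T @ P)
print(same_image, det_before, det_after)
```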

Given four corresponding rays $p_j = (x_j, y_j, 1) \leftrightarrow p'_j = (x'_j, y'_j, 1)$, $j = 1, \ldots, 4$, we would like to find a linear transformation $T$ and the scalars $\rho_j$ such that $\rho_j p'_j = Tp_j$. Note that because only ratios are involved, we can set $\rho_4 = 1$. The following are a basic lemma and theorem adapted from Semple and Kneebone (1952).

Lemma 1 If $p_1, \ldots, p_4$ are four vectors in $\mathbb{R}^3$, no three of which are linearly dependent, and if $e_1, \ldots, e_4$ are respectively the vectors $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$, $(1, 1, 1)$, there exists a non-singular linear transformation $A$ such that $Ae_j = \lambda_j p_j$, where the $\lambda_j$ are non-zero scalars; and the matrices of any two transformations with this property differ at most by a scalar factor.
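Lemma 1 yields a direct recipe for the collineation through four point pairs: build $A$ for the first set and $A'$ for the second, then compose $T = A'A^{-1}$, which sends $p_j$ to $e_j$ and $e_j$ to $p'_j$ (each up to scale). The sketch below assumes NumPy; the function names and sample points are hypothetical, and the composition step is the standard consequence of the lemma rather than a quotation of the paper's own procedure.

```python
import numpy as np

def basis_map(p1, p2, p3, p4):
    """Matrix A with A e_j = lambda_j p_j, e_4 = (1, 1, 1), lambda_4 = 1.

    Column j of A is lambda_j p_j (j = 1, 2, 3); the lambdas solve
    [p1 p2 p3] lambda = p4, so that A (1, 1, 1)^T = p4.
    """
    M = np.column_stack([p1, p2, p3])
    lam = np.linalg.solve(M, p4)      # raises if p1, p2, p3 are dependent
    return M * lam                    # broadcasting scales column j by lam[j]

def projectivity(src, dst):
    """Collineation T with rho_j dst_j = T src_j for four point pairs."""
    A_src = basis_map(*src)           # e_j -> src_j (up to scale)
    A_dst = basis_map(*dst)           # e_j -> dst_j (up to scale)
    return A_dst @ np.linalg.inv(A_src)

def ray(x, y):
    return np.array([x, y, 1.0])

src = [ray(0, 0), ray(1, 0), ray(1, 1), ray(0, 1)]
dst = [ray(0.1, 0.2), ray(1.3, 0.1), ray(1.1, 1.4), ray(-0.2, 0.9)]
T = projectivity(src, dst)

# Each T src_j is a multiple of dst_j; dividing by the third entry removes rho_j.
images_ok = [np.allclose((T @ s) / (T @ s)[2], d) for s, d in zip(src, dst)]
print(images_ok)
```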

Proof: Let $p_j$ have the components $(x_j, y_j, 1)$, and without loss of generality let $\lambda_4 = 1$. The matrix $A$ satisfies the three conditions $Ae_j = \lambda_j p_j$, $j = 1, 2, 3$, if and only if $\lambda_j p_j$ is the $j$-th column of $A$. Because of the fourth con-










Figure 13: The cross-ratio of four distinct concurrent 
rays is equal to the cross-ratio of the four distinct points 
that result from intersecting the rays by a transversal. 


uniquely represented as a 2D affine transformation of the non-homogeneous coordinates of the points. Namely, if $p = (x, y)$ and $p' = (x', y')$ are two corresponding points, then

$$p' = Ap + w$$

where $A$ is a non-singular matrix and $w$ is a vector. The six parameters of the transformation can be recovered from two non-collinear sets of three points, $p_0, p_1, p_2$ and $p'_0, p'_1, p'_2$. Let

$$A = \begin{bmatrix} x'_1 - x'_0 & x'_2 - x'_0 \\ y'_1 - y'_0 & y'_2 - y'_0 \end{bmatrix} \begin{bmatrix} x_1 - x_0 & x_2 - x_0 \\ y_1 - y_0 & y_2 - y_0 \end{bmatrix}^{-1}$$


and $w = p'_0 - Ap_0$, which together satisfy $p'_j - p'_0 = A(p_j - p_0)$ for $j = 1, 2$. For any arbitrary point $p$ on the plane, $p - p_0$ is spanned by the two vectors $p_1 - p_0$ and $p_2 - p_0$, i.e., $p - p_0 = a_1(p_1 - p_0) + a_2(p_2 - p_0)$; and because translation in depth is lost in parallel projection, we have that $p' - p'_0 = a_1(p'_1 - p'_0) + a_2(p'_2 - p'_0)$, and therefore $p' - p'_0 = A(p - p_0)$.
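The recovery of the affine parameters from three point pairs amounts to one 2x2 inversion. A minimal NumPy sketch, with hypothetical names and a synthetic ground-truth transformation used only to generate test data:

```python
import numpy as np

def affine_from_three(points, images):
    """Recover A and w in p' = A p + w from p0, p1, p2 -> p'0, p'1, p'2."""
    p0, p1, p2 = (np.asarray(v, float) for v in points)
    q0, q1, q2 = (np.asarray(v, float) for v in images)
    D = np.column_stack([p1 - p0, p2 - p0])   # singular if p0, p1, p2 are collinear
    D_img = np.column_stack([q1 - q0, q2 - q0])
    A = D_img @ np.linalg.inv(D)
    w = q0 - A @ p0
    return A, w

# Synthetic data from a hypothetical ground-truth affine map.
A_true = np.array([[1.1, -0.4],
                   [0.3, 0.8]])
w_true = np.array([2.0, -1.0])
pts = [np.array([0.0, 0.0]), np.array([1.0, 0.2]), np.array([0.3, 1.5])]
A, w = affine_from_three(pts, [A_true @ v + w_true for v in pts])

# Any further point then transfers as p' = A p + w.
p = np.array([2.0, -3.0])
print(A @ p + w, A_true @ p + w_true)
```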


B Cross-Ratio and the Linear Combination of Rays

The cross-ratio of four collinear points $A, B, C, D$ is preserved under central projection and is defined as:

$$\alpha = \frac{AB}{AC} \Big/ \frac{DB}{DC}$$


The cross-ratio of rays is computed algebraically through linear combination of points in homogeneous coordinates (see Gans 1969, pp. 291-295), as follows. Let the rays $a, b, c, d$ be represented by vectors $(a_1, a_2, a_3), \ldots, (d_1, d_2, d_3)$, respectively. We can represent the rays $a, d$ as linear combinations of the rays $b, c$, by

$$a = b + kc, \qquad d = b + k'c,$$

where equality is up to scale. For example, $k$ can be found by solving the linear system of three equations $\rho a = b + kc$ with two unknowns $\rho, k$ (one can solve using any two of the three equations, or find a least-squares solution using all three equations).
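We shall assume, first, that the points are Euclidean. The ratio in which $A$ divides the line $BC$ can be derived by:

$$\frac{AB}{AC} = \frac{\frac{a_1}{a_3} - \frac{b_1}{b_3}}{\frac{a_1}{a_3} - \frac{c_1}{c_3}} = \frac{\frac{b_1 + kc_1}{b_3 + kc_3} - \frac{b_1}{b_3}}{\frac{b_1 + kc_1}{b_3 + kc_3} - \frac{c_1}{c_3}} = -k\frac{c_3}{b_3}.$$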

Similarly, we have $\frac{DB}{DC} = -k'\frac{c_3}{b_3}$ and, therefore, the cross-ratio of the four rays is $\alpha = \frac{k}{k'}$. The same result holds under more general conditions, i.e., points can be ideal as well:

Proposition 6 If $A, B, C, D$ are distinct collinear points, with homogeneous coordinates $b + kc$, $b$, $c$, $b + k'c$, then the canonical cross-ratio is $k/k'$.

(For a complete proof, see Gans 1969, pp. 294-295.) For our purposes it is sufficient to consider the case when one of the points, say the vector $d$, is ideal (i.e., $d_3 = 0$). From the vector equation $\rho d = b + k'c$, we have that $k' = -\frac{b_3}{c_3}$ and, therefore, the ratio $\frac{DB}{DC} = 1$. As a result, the cross-ratio is determined only by the first term, i.e., $\alpha = \frac{AB}{AC} = -k\frac{c_3}{b_3}$, which is what we would expect if we represented points in the Euclidean plane and allowed the point $D$ to extend to infinity along the line $A, B, C, D$ (see Figure 13).
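Proposition 6 and the ideal-point case can be checked numerically: solve $\rho a = b + kc$ by least squares, as suggested in the text, and compare $k/k'$ with the Euclidean cross-ratio. In this sketch (NumPy assumed, names hypothetical) the four points lie on the line $y = 0.5x + 1$, each ray is given an arbitrary scale to show that scale does not matter, and the Euclidean ratio is computed from $x$-coordinates, which is valid because the line is not vertical.

```python
import numpy as np

def combination_coefficient(a, b, c):
    """k such that rho * a = b + k * c, by least squares over all three rows."""
    # Unknowns (rho, k): rho * a - k * c = b.
    M = np.column_stack([a, -c])
    (_rho, k), *_ = np.linalg.lstsq(M, b, rcond=None)
    return k

def ray_cross_ratio(a, b, c, d):
    """Cross-ratio k / k' of the rays a, b, c, d (Proposition 6)."""
    return combination_coefficient(a, b, c) / combination_coefficient(d, b, c)

def euclidean_cross_ratio(A, B, C, D):
    """(AB / AC) / (DB / DC) from signed x-coordinates of collinear points."""
    return ((A - B) / (A - C)) / ((D - B) / (D - C))

# A, B, C, D on the line y = 0.5 x + 1, each ray with an arbitrary scale.
xs = [3.0, 0.0, 1.0, 5.0]
scales = [2.0, 1.0, -3.0, 0.5]
rays = [s * np.array([x, 0.5 * x + 1.0, 1.0]) for s, x in zip(scales, xs)]
alpha = ray_cross_ratio(*rays)
alpha_euclid = euclidean_cross_ratio(*xs)      # both are 1.2 up to rounding

# Ideal D (d_3 = 0) along the line direction: alpha reduces to AB / AC = 1.5.
d_ideal = np.array([1.0, 0.5, 0.0])
alpha_ideal = ray_cross_ratio(rays[0], rays[1], rays[2], d_ideal)
print(alpha, alpha_euclid, alpha_ideal)
```

With these values $k = 0.5$, $k' = 5/12$ for the finite $D$, and $k' = -b_3/c_3 = 1/3$ for the ideal one.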

The derivation so far can be translated directly to our purposes of computing the projective shape constant by replacing $a, b, c, d$ with $p', p', p', V_l$, respectively.
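The claim that the cross-ratio of four concurrent rays does not depend on the transversal (Figure 13) can be tested with homogeneous lines. In this NumPy sketch, which is illustrative only, the rays pass through the origin with hypothetical slopes 0.2, 0.5, 1 and 2, and each transversal is assumed not to be parallel to any ray and not to pass through the origin.

```python
import numpy as np

def cross_ratio_on_transversal(rays, transversal):
    """(AB/AC) / (DB/DC) for the points where four lines meet a transversal.

    Lines and the transversal are homogeneous 3-vectors; the meets are
    computed with cross products and compared by signed position.
    """
    pts = [np.cross(r, transversal) for r in rays]
    pts = [x[:2] / x[2] for x in pts]            # assumes no meet is ideal
    direction = pts[2] - pts[1]
    A, B, C, D = (float(np.dot(x - pts[1], direction)) for x in pts)
    return ((A - B) / (A - C)) / ((D - B) / (D - C))

O = np.array([0.0, 0.0, 1.0])                    # common point of the rays
slopes = [0.2, 0.5, 1.0, 2.0]
rays = [np.cross(O, np.array([1.0, m, 1.0])) for m in slopes]

transversals = [
    np.array([1.0, 0.0, -1.0]),                      # the line x = 1
    np.cross([3.0, -1.0, 1.0], [1.0, 4.0, 1.0]),     # through (3, -1) and (1, 4)
    np.cross([5.0, 0.0, 1.0], [0.0, 6.0, 1.0]),      # through (5, 0) and (0, 6)
]
values = [cross_ratio_on_transversal(rays, t) for t in transversals]
print(values)   # all three are 0.25 up to rounding
```

On the line $x = 1$ the four meets are $(1, m)$, so the common value is the cross-ratio of the slopes, $(0.3/0.8)/(1.5/1) = 0.25$.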

Because the cross-ratio is preserved under central projection, the projections $A', B', C', D'$ of the four points satisfy

$$\alpha = \frac{AB}{AC} \Big/ \frac{DB}{DC} = \frac{A'B'}{A'C'} \Big/ \frac{D'B'}{D'C'}$$

(see Figure 13). All permutations of the four points are allowed, and in general there are six distinct cross-ratios that can be computed from four collinear points. Because the cross-ratio is invariant to projection, any transversal meeting four distinct concurrent rays in four distinct points will have the same cross-ratio; therefore one can speak of the cross-ratio of rays (concurrent or parallel) $a, b, c, d$.

The cross-ratio result in terms of rays, rather than points, is appealing for two reasons: it enables the application of linear algebra (rays are represented as points in homogeneous coordinates), and, more importantly, it allows us to treat ideal points as any other point (critical for having an algebraic system that is well defined under both central and parallel projection).

C On Epipolar Transformations


Proposition 7 The epipolar lines $pV_r$ and $p'V_l$ are perspectively related.

Proof: Consider Figure 14. We have already established that $p$ projects onto the left epipolar line $p'V_l$. By definition, the right epipole $V_r$ projects onto the left epipole $V_l$; therefore, because lines are projective invariants, the line $pV_r$ projects onto the line $p'V_l$. □

The result that epipolar lines in one image are perspectively related to the epipolar lines in the other image implies that there exists a projective transformation $F$ that maps epipolar lines $l_j$ onto epipolar lines $l'_j$, that is, $Fl_j = \rho_j l'_j$, where $l_j = p_j \times V_r$ and $l'_j = p'_j \times V_l$. From the property of point/line duality of projective geometry (Semple and Kneebone, 1952), the transformation $E$ that maps points on the epipolar lines $l_j$ onto points on the corresponding epipolar lines $l'_j$ is induced from $F$, i.e., $E = (F^{-1})^T$.
Figure 14: Epipolar lines are perspectively related.


Proposition 8 (point/line duality) The transformation for projecting $p$ onto the left epipolar line $p'V_l$ is $E = (F^{-1})^T$.

Proof: Let $l, l'$ be corresponding epipolar lines, related by the equation $\rho l' = Fl$. Let $p, p'$ be any two points, one on each epipolar line (not necessarily corresponding points). From the point/line incidence axiom we have that $l^T p = 0$. By substituting $l$ we have

$$[\rho F^{-1} l']^T p = 0 \;\Longrightarrow\; \rho\, l'^T [F^{-T} p] = 0.$$

Therefore, the collineation $E = (F^{-1})^T$ maps points $p$ onto the corresponding left epipolar line. □
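Proposition 8 can be verified numerically: with $l' = Fl$, every point $p$ on $l$ satisfies $l'^T(Ep) = l^T F^T F^{-T} p = l^T p = 0$. The matrix `F` and the sample points below are hypothetical; NumPy is assumed.

```python
import numpy as np

# Hypothetical epipolar line transformation F (any non-singular 3x3 works).
F = np.array([[1.0, 0.2, -0.5],
              [0.3, 1.4, 0.1],
              [-0.2, 0.4, 0.9]])
E = np.linalg.inv(F).T                       # induced point map, E = (F^-1)^T

# A line l through two points, and its image l' = F l (scale rho = 1).
p1 = np.array([1.0, 2.0, 1.0])
p2 = np.array([-3.0, 0.5, 1.0])
l = np.cross(p1, p2)
l_prime = F @ l

# Points on l are mapped by E to points on l': l'^T (E p) = l^T p = 0.
samples = [p1 + s * (p2 - p1) for s in np.linspace(-2.0, 2.0, 5)]
on_line = [bool(np.isclose(l_prime @ (E @ p), 0.0, atol=1e-9)) for p in samples]
print(on_line)
```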

It is intuitively clear that the epipolar line transformation $F$ is not unique, and therefore the induced transformation $E$ is not unique either. The correspondence between the epipolar lines is not disturbed under translation along the line $V_1V_2$, or under non-rigid camera motion that results from tilting the image plane with respect to the optical axis such that the epipole remains on the line $l_j l'_j$.

Proposition 9 The epipolar transformation F is not 
unique. 

Proof: A projective transformation is determined by four corresponding pencils. The transformation is unique (up to a scale factor) if no three of the pencils are linearly dependent, i.e., if the pencils are lines, then no three of the four lines should be coplanar. The epipolar line transformation $F$ can be determined by the corresponding epipoles, $V_r \leftrightarrow V_l$, and three corresponding epipolar lines $l_j \leftrightarrow l'_j$. We show next that the epipolar lines are coplanar, and therefore $F$ cannot be determined uniquely.

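Let $p_j$ and $p'_j$, $j = 1, 2, 3$, be three corresponding points, and let $l_j = p_j \times V_r$ and $l'_j = p'_j \times V_l$. Let $\hat{p}_3 = \alpha p_1 + \beta p_2$, $\alpha + \beta = 1$, be a point on the epipolar line $p_3 V_r$ collinear with $p_1, p_2$. Because $p_3$, $\hat{p}_3$ and $V_r$ are collinear, $p_3 = a\hat{p}_3 + bV_r$ for some scalars $a, b$, and we have

$$l_3 = p_3 \times V_r = (a\hat{p}_3 + bV_r) \times V_r = a\,\hat{p}_3 \times V_r = a\alpha l_1 + a\beta l_2,$$

and similarly $l'_3 = \alpha' l'_1 + \beta' l'_2$. □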



Figure 15: See text. 


The epipolar transformation, therefore, has three free parameters (one for scale, and the other two because the equation $Fl_3 = \rho_3 l'_3$ has dropped out).
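One concrete way to see the non-uniqueness is that every epipolar line $l$ through $V_r$ satisfies $l^T V_r = 0$, so adding any rank-one term $wV_r^T$ to $F$ leaves its action on the pencil unchanged. This rank-one construction is an illustration of Proposition 9, not taken from the paper; the epipole, `F1` and `w` are hypothetical values, and NumPy is assumed.

```python
import numpy as np

V_r = np.array([2.0, 1.0, 1.0])              # hypothetical right epipole
F1 = np.array([[1.0, 0.2, -0.5],
               [0.3, 1.4, 0.1],
               [-0.2, 0.4, 0.9]])            # one epipolar line transformation
w = np.array([0.7, -0.3, 0.5])               # arbitrary vector
F2 = F1 + np.outer(w, V_r)                   # a genuinely different matrix

# Three epipolar lines l_j = p_j x V_r through the epipole.
points = [(0.0, 0.0), (1.0, 3.0), (-2.0, 5.0)]
lines = [np.cross(np.array([x, y, 1.0]), V_r) for x, y in points]

# F2 l = F1 l + w (V_r . l) = F1 l, because V_r . l = 0 for every such line.
same_action = all(np.allclose(F1 @ l, F2 @ l) for l in lines)

# The lines are linearly dependent ("coplanar" as vectors): rank 2, not 3.
rank = np.linalg.matrix_rank(np.column_stack(lines))
print(same_action, rank, np.linalg.det(F2))
```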

D Affine Structure in Projective Space 

Proposition 10 The affine structure invariant, based 
on a single reference plane and a reference point, cannot 
be directly extended to central projection. 

Proof: Consider the drawing in Figure 15. Let $Q$ be the reference point, $P$ an arbitrary point of interest in space, and $\tilde{Q}, \tilde{P}$ the projections of $Q$ and $P$ onto the reference plane (see section 4 for the definition of affine structure under parallel projection).

The relationship between the points $P, Q, \tilde{P}, \tilde{Q}$ and the points $p', \tilde{p}', q', \tilde{q}'$ can be described as a perspectivity between two triangles. However, in order to establish an invariant between the two triangles one must have a coplanar point outside each of the triangles; therefore, the five corresponding points are not sufficient for determining an invariant relation (this is known as the 'five point invariant', which requires that no three of the points be collinear). □

E On the Intersection of Epipolar Lines 

Barret, et al. (1991) derive a quadratic invariant based on Longuet-Higgins' fundamental matrix. We briefly describe their invariant and show that it is equivalent to performing re-projection using intersection of epipolar lines.

In section 8 we derived Longuet-Higgins' fundamental matrix relation $p'^T H p = 0$. Barret, et al. note that the equation can be written in vector form $h^T q = 0$, where $h$ contains the elements of $H$ and

$$q = (x'x, x'y, x', y'x, y'y, y', x, y, 1).$$
Therefore, the matrix

$$B = \begin{bmatrix} q_1^T \\ \vdots \\ q_9^T \end{bmatrix},$$

whose $i$-th row is the vector $q$ of the $i$-th corresponding pair, must have a vanishing determinant. Given eight corresponding points, the condition $|B| = 0$ leads to a constraint line in terms of the coordinates of any ninth point, i.e., $\alpha x + \beta y + \gamma = 0$. The location of the ninth point in any third view can, therefore, be determined by intersecting the constraint lines derived from views 1 and 3, and views 2 and 3.

[5] T. Broida, S. Chandrashekhar, and R. Chellappa. Recursive 3-D motion estimation from a monocular image sequence. IEEE Transactions on Aerospace and Electronic Systems, 26:639-656, 1990.

[6] D.C. Brown. Close-range camera calibration. Photogrammetric Engineering, 37:855-866, 1971.

Another way of deriving this re-projection method is by first noticing that $H$ is a correlation that maps $p$ onto the corresponding epipolar line $l' = V_l \times p'$ (see section 8). Therefore, from views 1 and 3 we have the relation

$$p''^T E p = 0,$$

and from views 2 and 3 we have the relation

$$p''^T \tilde{E} p' = 0,$$

where $Ep$ and $\tilde{E}p'$ are two intersecting epipolar lines. Given eight corresponding points, we can recover $E$ and $\tilde{E}$. The location of any ninth point $p''$ can be recovered by intersecting the lines $Ep$ and $\tilde{E}p'$.

This way of deriving the re-projection method has an advantage over using the condition $|B| = 0$ directly, because one can use more than eight points in a least-squares solution (via SVD) for the matrices $E$ and $\tilde{E}$.
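The least-squares estimation mentioned above can be sketched with the row vectors $q$ defined earlier: stack one row per correspondence and take the right singular vector with the smallest singular value as $h$. This is a plain (unnormalized) linear estimate on noise-free synthetic data; the random projective cameras `P1`, `P3` and the function names are hypothetical, and each homogeneous point is scaled to unit length, which does not change the null vector.

```python
import numpy as np

def estimate_relation(x_from, x_to):
    """Least-squares H with x_to^T H x_from = 0 from N >= 8 homogeneous pairs.

    Each pair contributes the row q = (x'x, x'y, x', y'x, y'y, y', x, y, 1)
    (np.kron(b, a) for a = (x, y, 1), b = (x', y', 1)); h is the right
    singular vector of the stacked rows with the smallest singular value.
    """
    Q = np.array([np.kron(b, a) for a, b in zip(x_from, x_to)])
    _, _, Vt = np.linalg.svd(Q)
    return Vt[-1].reshape(3, 3)              # row-major, matching q

def unit_rows(x):
    """Scale each homogeneous point to unit length (the scale is irrelevant)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Noise-free synthetic data from hypothetical random projective cameras.
rng = np.random.default_rng(1)
P1, P3 = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
X = np.vstack([rng.normal(size=(3, 12)), np.ones((1, 12))])   # 12 scene points
x1 = unit_rows((P1 @ X).T)
x3 = unit_rows((P3 @ X).T)
E = estimate_relation(x1, x3)

# A point not used in the fit also satisfies p''^T E p = 0.
Xn = np.array([0.3, -0.7, 1.1, 1.0])
p = P1 @ Xn
p3 = P3 @ Xn
residual = (p3 / np.linalg.norm(p3)) @ E @ (p / np.linalg.norm(p))
print(abs(residual))
```

With real measurements one would typically normalize the image coordinates before the SVD; that step is omitted here because the data are exact.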

Approaching the re-projection problem using intersection of epipolar lines is problematic for novel views that have a similar epipolar geometry to that of the two model views (these are situations where the two lines $Ep$ and $\tilde{E}p'$ are nearly parallel, such as when the object rotates around nearly the same axis for all views). We therefore expect sensitivity to errors also under conditions of small separation between views. The method becomes more practical if one uses multiple model views instead of only two, because each model view adds one epipolar line and all lines should intersect at the location of the point of interest in the novel view.
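Re-projection by intersecting epipolar lines, including the multi-view variant in which each model view contributes one line, can be sketched as a small null-space problem: the novel-view point $p''$ should satisfy $l_i^T p'' = 0$ for every line $l_i = H_i x_i$. To keep the example self-contained, the relations $H_i$ are built from hypothetical random camera matrices with the standard formula $[e_b]_\times P_b P_a^{+}$ rather than estimated from point correspondences; in the setting of the text they would come from eight or more points via the least-squares solution described above. NumPy is assumed.

```python
import numpy as np

def skew(e):
    """Matrix [e]_x with skew(e) @ v == np.cross(e, v)."""
    return np.array([[0.0, -e[2], e[1]],
                     [e[2], 0.0, -e[0]],
                     [-e[1], e[0], 0.0]])

def relation_from_cameras(Pa, Pb):
    """H with xb^T H xa = 0, built as [e_b]_x Pb Pa^+ (e_b = Pb * center_a)."""
    center = np.linalg.svd(Pa)[2][-1]          # null vector of the 3x4 Pa
    return skew(Pb @ center) @ Pb @ np.linalg.pinv(Pa)

def reproject(model_points, relations):
    """Least-squares intersection of the epipolar lines H_i x_i in the novel
    view; with two model views this is simply the meet of two lines."""
    L = np.array([H @ x for H, x in zip(relations, model_points)])
    L = L / np.linalg.norm(L, axis=1, keepdims=True)
    return np.linalg.svd(L)[2][-1]             # p'' with L p'' ~ 0

rng = np.random.default_rng(2)
cams = [rng.normal(size=(3, 4)) for _ in range(4)]   # 3 model views + novel view
novel = cams[-1]
relations = [relation_from_cameras(P, novel) for P in cams[:-1]]

X = np.array([0.4, -0.2, 1.3, 1.0])                  # point of interest
model_points = [P @ X for P in cams[:-1]]
p_true = novel @ X
p_two = reproject(model_points[:2], relations[:2])   # two model views
p_three = reproject(model_points, relations)         # three model views
print(p_two / p_two[2], p_three / p_three[2], p_true / p_true[2])
```

With noisy data the extra model views turn the exact intersection into a least-squares fit, which is what makes the multi-view variant less sensitive when two epipolar lines are nearly parallel.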